@maltesander maltesander commented Nov 4, 2025

Description

2025-11-03 13:03:58,809 INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(649)) - Session connected.
2025-11-03 13:03:58,810 INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:fenceOldActive(1019)) - Checking for any old active which needs to be fenced...
2025-11-03 13:03:58,811 INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:fenceOldActive(1040)) - Old node exists: 0a04686466731217686466732d6e616d656e6f64652d64656661756c742d301a47686466732d6e616d656e6f64652d64656661756c742d302e686466732d6e616d656e6f64652d64656661756c742e64656661756c742e7376632e636c75737465722e6c6f63616c20d43e28d33e
2025-11-03 13:03:58,812 WARN  ha.ActiveStandbyElector (ActiveStandbyElector.java:becomeActive(952)) - Exception handling the winning of election
java.lang.RuntimeException: Mismatched address stored in ZK for NameNode at hdfs-namenode-default-0.hdfs-namenode-default-headless.default.svc.cluster.local/100.97.191.176:8020: Stored protobuf was nameserviceId: "hdfs"
namenodeId: "hdfs-namenode-default-0"
hostname: "hdfs-namenode-default-0.hdfs-namenode-default.default.svc.cluster.local"
port: 8020
zkfcPort: 8019
, address from our own configuration for this NameNode was hdfs-namenode-default-0.hdfs-namenode-default-headless.default.svc.cluster.local/100.97.191.176:8020
        at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.dataToTarget(DFSZKFailoverController.java:91)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:533)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:65)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:973)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:1044)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:943)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:509)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:675)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:554)
2025-11-03 13:03:58,812 INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:reJoinElection(799)) - Trying to re-establish ZK session
2025-11-03 13:03:58,916 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(1232)) - Session: 0x100088e24760797 closed
2025-11-03 13:03:59,917 INFO  zookeeper.ZooKeeper (ZooKeeper.java:<init>(637)) - Initiating client connection, connectString=zookeeper-server.default.svc.cluster.local:2282/znode-2aa300aa-3042-43ca-8076-4a224c72dea6 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@6cdbb1a5
2025-11-03 13:03:59,917 INFO  zookeeper.ClientCnxnSocket (ClientCnxnSocket.java:initProperties(239)) - jute.maxbuffer value is 1048575 Bytes
2025-11-03 13:03:59,917 INFO  zookeeper.ClientCnxn (ClientCnxn.java:initRequestTimeout(1747)) - zookeeper.request.timeout value is 0. feature enabled=false
2025-11-03 13:03:59,917 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1177)) - Opening socket connection to server zookeeper-server.default.svc.cluster.local/100.64.47.126:2282.
2025-11-03 13:03:59,918 INFO  zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1179)) - SASL config status: Will not attempt to authenticate using SASL (unknown error)
2025-11-03 13:03:59,918 INFO  zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(1013)) - Socket connection established, initiating session, client: /100.97.191.176:34248, server: zookeeper-server.default.svc.cluster.local/100.64.47.126:2282
2025-11-03 13:03:59,926 INFO  zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1453)) - Session establishment complete on server zookeeper-server.default.svc.cluster.local/100.64.47.126:2282, session id = 0x100088e24760799, negotiated timeout = 10000
2025-11-03 13:03:59,926 WARN  ha.ActiveStandbyElector (ActiveStandbyElector.java:isStaleClient(1176)) - Ignoring stale result from old client with sessionId 0x100088e24760797
2025-11-03 13:03:59,926 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(569)) - EventThread shut down for session: 0x100088e24760797
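
The `Old node exists:` line in the log above carries the znode payload as a hex string. As a minimal sketch (not part of this PR), the payload can be decoded by hand to check which hostname is stored in ZooKeeper. The mapping of field numbers to names is an assumption based on the order in which the `Stored protobuf was` message prints them, and the helper names `read_varint` and `decode_fields` are hypothetical.

```python
# Hex payload copied from the "Old node exists:" log line, split for readability.
ZNODE_HEX = (
    "0a0468646673"
    "1217686466732d6e616d656e6f64652d64656661756c742d30"
    "1a47686466732d6e616d656e6f64652d64656661756c742d30"
    "2e686466732d6e616d656e6f64652d64656661756c74"
    "2e64656661756c742e7376632e636c75737465722e6c6f63616c"
    "20d43e28d33e"
)

# Assumed field names, taken from the order printed in the exception message.
FIELD_NAMES = {1: "nameserviceId", 2: "namenodeId", 3: "hostname", 4: "port", 5: "zkfcPort"}


def read_varint(buf, pos):
    """Read a protobuf base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(buf):
    """Decode a flat protobuf message containing only varint and string fields."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited, treated as a UTF-8 string here
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[number] = value
    return fields


info = decode_fields(bytes.fromhex(ZNODE_HEX))
for number, value in sorted(info.items()):
    print(f"{FIELD_NAMES.get(number, number)}: {value}")
```

The decoded hostname is `hdfs-namenode-default-0.hdfs-namenode-default.default.svc.cluster.local`, without the `-headless` suffix that appears in the address from the NameNode's own configuration. This matches the mismatch reported in the exception.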

Definition of Done Checklist

  • Not all of these items apply to every PR; the author should update this template to leave in only the relevant boxes
  • Please make sure all these things are done and tick the boxes

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and the deployed operator works
  • Integration tests passed (for non-trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

sbernauer previously approved these changes Nov 4, 2025

@sbernauer sbernauer left a comment

Did not test it but LGTM, thanks!

@sbernauer sbernauer moved this to Development: In Review in Stackable Engineering Nov 4, 2025
@sbernauer sbernauer changed the title from "revert/remove headless suffix from headless service" to "fix: Revert/remove headless suffix from headless service" Nov 4, 2025
@sbernauer sbernauer enabled auto-merge November 4, 2025 12:19
@sbernauer sbernauer moved this from Development: In Review to Development: Done in Stackable Engineering Nov 4, 2025
@sbernauer sbernauer added this pull request to the merge queue Nov 4, 2025
Merged via the queue into main with commit 3238cd1 Nov 4, 2025
17 checks passed
@sbernauer sbernauer deleted the fix/revert-headless-service-name branch November 4, 2025 13:20
Projects

Status: Development: Done

4 participants